star model


STAN: Smooth Transition Autoregressive Networks

Inzirillo, Hugo, Genet, Remi

arXiv.org Artificial Intelligence

Many economic and financial time series exhibit nonlinear dynamics driven by gradual regime changes. Traditional Smooth Transition Autoregressive (STAR) models offer an effective way to model these dynamics through smooth regime changes based on specific transition variables. In this paper, we propose a novel approach by drawing an analogy between STAR models and a multilayer neural network architecture. Our proposed neural network architecture mimics the STAR framework, employing multiple layers to simulate the smooth transition between regimes and capturing complex, nonlinear relationships. The network's hidden layers and activation functions are structured to replicate the gradual switching behavior typical of STAR models, allowing for a more flexible and scalable approach to regime-dependent modeling. This research suggests that neural networks can provide a powerful alternative to STAR models, with the potential to enhance predictive accuracy in economic and financial forecasting.
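The smooth switching the abstract describes can be made concrete with a two-regime logistic STAR(1) process. The following is a minimal sketch, not the paper's code: the function names (`logistic_transition`, `star_step`) and the parameter values (gamma, the two AR coefficients) are illustrative choices, and the transition variable here is simply the lagged value itself.

```python
import numpy as np

def logistic_transition(s, gamma=5.0, c=0.0):
    """Transition weight G(s) in (0, 1); gamma controls how sharp the switch is."""
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def star_step(y_prev, phi1=0.9, phi2=-0.4):
    """One step of a two-regime STAR(1): a smooth blend of two AR(1) regimes."""
    g = logistic_transition(y_prev)
    return (1.0 - g) * phi1 * y_prev + g * phi2 * y_prev

# Simulate a short path with small Gaussian noise.
rng = np.random.default_rng(0)
y = [0.1]
for _ in range(100):
    y.append(star_step(y[-1]) + 0.1 * rng.standard_normal())
```

As gamma grows, `logistic_transition` approaches a hard indicator and the model collapses to a threshold AR; the neural-network analogy in the paper replaces this single logistic gate with learned hidden layers.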


Do Deep Neural Network Solutions Form a Star Domain?

Sonthalia, Ankit, Rubinstein, Alexander, Abbasnejad, Ehsan, Oh, Seong Joon

arXiv.org Artificial Intelligence

It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path with low loss can connect two independent solutions, provided the weights of one model are appropriately permuted. However, current methods to test this theory often require very wide networks to succeed. In this work, we conjecture that more generally, the SGD solution set is a "star domain" that contains a "star model" that is linearly connected to all the other solutions via paths with low loss values, modulo permutations. We propose the Starlight algorithm that finds a star model of a given learning task. We validate our claim by showing that this star model is linearly connected with other independently found solutions. As an additional benefit of our study, we demonstrate better uncertainty estimates on the Bayesian Model Averaging over the obtained star domain. Further, we demonstrate star models as potential substitutes for model ensembles. Our code is available at https://github.com/aktsonthalia/starlight.
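The linear connectivity being tested can be sketched as follows: walk the straight line between two parameter vectors and record the worst loss along the path (the "barrier"). This is an illustrative toy, not the Starlight algorithm itself; the names (`max_barrier`, `interpolate_params`) and the use of a simple quadratic loss for a linear model are assumptions made here to keep the sketch self-contained.

```python
import numpy as np

def interpolate_params(theta_star, theta_other, alphas):
    """Points on the linear path theta(a) = (1 - a) * theta_star + a * theta_other."""
    return [(1.0 - a) * theta_star + a * theta_other for a in alphas]

def loss(theta, X, t):
    """Mean squared error of a linear model, standing in for a network's loss."""
    return float(np.mean((X @ theta - t) ** 2))

def max_barrier(theta_star, theta_other, X, t, n=11):
    """Highest loss encountered along the linear path between two solutions."""
    alphas = np.linspace(0.0, 1.0, n)
    return max(loss(th, X, t) for th in interpolate_params(theta_star, theta_other, alphas))
```

A star model, in this picture, is one for which `max_barrier` stays low against every other independently trained solution (after permuting that solution's weights); for a real network the loss would be non-convex, which is what makes the claim non-trivial.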


Semiparametric count data regression for self-reported mental health

Kowal, Daniel R., Wu, Bohan

arXiv.org Machine Learning

"For how many days during the past 30 days was your mental health not good?" The responses to this question measure self-reported mental health and can be linked to important covariates in the National Health and Nutrition Examination Survey (NHANES). However, these count variables present major distributional challenges: the data are overdispersed, zero-inflated, bounded by 30, and heaped in five- and seven-day increments. To meet these challenges, we design a semiparametric estimation and inference framework for count data regression. The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model. The transformation is estimated nonparametrically and the rounding operator ensures the correct support for the discrete and bounded data. Maximum likelihood estimators are computed using an EM algorithm that is compatible with any continuous data model estimable by least squares. STAR regression includes asymptotic hypothesis testing and confidence intervals, variable selection via information criteria, and customized diagnostics. Simulation studies validate the utility of this framework. STAR is deployed to study the factors associated with self-reported mental health and demonstrates substantial improvements in goodness-of-fit compared to existing count data regression models.
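The simultaneous transform-and-round data-generating process can be sketched in a few lines: draw from a latent Gaussian regression, invert a transformation, then round and truncate to the bounded count support. This is a hedged illustration, not the paper's estimator: the function name `star_sample`, the seeded default generator, and the choice of `log1p` (so its inverse is `expm1`) as the transformation are all assumptions; in the paper the transformation is estimated nonparametrically.

```python
import numpy as np

def star_sample(mu, sigma=1.0, upper=30, rng=None):
    """Draw one count from a STAR-style process:
    latent Gaussian -> inverse transformation -> rounding -> bounded support."""
    if rng is None:
        rng = np.random.default_rng(0)
    z = rng.normal(mu, sigma)        # latent Gaussian regression draw
    y_cont = np.expm1(z)             # inverse of an illustrative log1p transform
    # Rounding operator: floor, then clamp to the valid range [0, upper].
    return int(np.clip(np.floor(y_cont), 0, upper))
```

The rounding operator is what guarantees the correct discrete, bounded support here; zero-inflation and the boundary heap at 30 arise naturally whenever the latent draw falls below zero or above the upper transformation threshold.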


A Simultaneous Transformation and Rounding Approach for Modeling Integer-Valued Data

Kowal, Daniel R., Canale, Antonio

arXiv.org Machine Learning

Integer-valued and count data are ubiquitous in many fields, including epidemiology (Osthus et al., 2018; Kowal, 2019), ecology (Dorazio et al., 2005), and insurance (Bening and Korolev, 2012), among others (Cameron and Trivedi, 2013). Count data also serve as an indicator of demand, such as the demand for medical services (Deb and Trivedi, 1997), emergency medical services (Matteson et al., 2011), and call center access (Shen and Huang, 2008). In these applications and many others, integer-valued data are frequently observed jointly with predictors, over time intervals, or across spatial locations. Integer-valued data also exhibit a variety of distributional features, including zero-inflation, skewness, over- or underdispersion, and in some cases may be bounded or censored. Flexible and interpretable models for integer-valued processes are therefore highly useful in practice. The most widely used models for count data build upon the Poisson distribution. However, the limitations of the Poisson distribution are well known: the distribution is not sufficiently flexible in practice and cannot account for zero-inflation or over- and underdispersion. A common strategy is to generalize the Poisson model by introducing additional parameters.
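The equidispersion constraint mentioned above is easy to see numerically: a Poisson sample has variance equal to its mean, while a negative binomial with the same mean can have much larger variance. The parameter values below are illustrative; with NumPy's parameterization, `negative_binomial(n, p)` has mean `n(1-p)/p`, so `n=2, p=1/3` matches a mean of 4 while allowing overdispersion.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 4.0

# Poisson: variance is forced to equal the mean (equidispersion).
pois = rng.poisson(lam, size=100_000)

# Negative binomial with the same mean (4.0) but variance 12.0:
# one extra parameter buys overdispersion.
nb = rng.negative_binomial(2, 1.0 / 3.0, size=100_000)
```

Generalizations such as the negative binomial handle overdispersion but still cannot, by themselves, capture underdispersion, bounds, or the heaping seen in the self-reported mental health data, which motivates the transform-and-round construction.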


Sparse Tensor Additive Regression

Hao, Botao, Wang, Boxiang, Wang, Pengyuan, Zhang, Jingfei, Yang, Jian, Sun, Will Wei

arXiv.org Machine Learning

In such applications, a fundamental statistical tool is tensor regression, a modern high-dimensional regression method that relates a scalar response to tensor covariates. For example, in neuroimaging analysis, an important objective is to predict clinical outcomes using subjects' brain imaging data. This can be formulated as a tensor regression problem by treating the clinical outcomes as the response and the brain images as the tensor covariates. Another example is the study of how advertisement placement affects users' clicking behavior in online advertising. This again can be formulated as a tensor regression problem by treating the daily overall click-through rate (CTR) as the response and the tensor that summarizes the impressions (i.e., view counts) of different advertisements on different devices (e.g., phone, computer, etc.) as the covariate. In Section 6, we consider such an online advertising application.
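The basic scalar-on-tensor setup can be sketched for the matrix (order-2) case: the response is the inner product of the covariate with a coefficient array, and a low-rank factorization keeps the parameter count manageable. This is a simplified illustration, not the paper's sparse additive model; the function name `rank1_predict` and the rank-1 choice are assumptions made here.

```python
import numpy as np

def rank1_predict(X, u, v):
    """Scalar response <X, B> with a rank-1 coefficient B = u v^T.

    A full coefficient matrix for a p1 x p2 covariate needs p1 * p2
    parameters; the rank-1 factorization needs only p1 + p2.
    """
    return float(u @ X @ v)
```

For example, with a 100 x 200 covariate (ad-by-device impression counts, say), the unstructured coefficient has 20,000 parameters versus 300 for the rank-1 version; higher CP ranks and sparsity penalties, as studied in the paper, interpolate between these extremes.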